ScaffMatch: Scaffolding Algorithm Based on Maximum Weight Matching

Authors

  • Igor Mandric
  • Alex Zelikovsky
Abstract

MOTIVATION: Next-generation high-throughput sequencing has become a state-of-the-art technique in genome assembly. Scaffolding is one of the main stages of the assembly pipeline. During this stage, contigs assembled from paired-end reads are merged into longer chains called scaffolds. Because of high levels of statistical noise, chimeric reads, and genome repeats, scaffolding is a challenging task. Current scaffolding software packages vary widely in quality and depend heavily on read data quality and genome complexity. There is no clear winner, and there is still considerable room for improving the tools.

RESULTS: This article presents ScaffMatch, an efficient scaffolding algorithm that handles reads with both short (<600 bp) and long (>35 000 bp) insert sizes and produces high-quality scaffolds. We evaluate our scaffolding tool with the F score and other metrics (N50, corrected N50) on eight datasets, comparing it with most available packages. Our experiments show that ScaffMatch is the preferred tool for most datasets.

AVAILABILITY AND IMPLEMENTATION: The source code is available at http://alan.cs.gsu.edu/NGS/?q=content/scaffmatch.

CONTACT: [email protected]

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
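The N50 metric mentioned in the abstract can be illustrated with a short sketch. It uses the common definition of N50 (the largest length L such that sequences of length at least L cover at least half of the total assembled length). The function name `n50` and the sample lengths are illustrative assumptions, not taken from the ScaffMatch source code, and the corrected N50 and F score used in the paper are not reproduced here.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    Assumes the common definition: walk the lengths from longest to
    shortest and return the length at which the running sum first
    reaches half of the total length.
    """
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    # Hypothetical scaffold lengths, for illustration only.
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

A higher N50 indicates longer scaffolds but says nothing about whether the joins are correct, which is presumably why the abstract also reports a corrected N50 and the F score.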

Similar resources

Sequence analysis ScaffMatch: scaffolding algorithm based on maximum weight matching

Motivation: Next-generation high-throughput sequencing has become a state-of-the-art technique in genome assembly. Scaffolding is one of the main stages of the assembly pipeline. During this stage, contigs assembled from the paired-end reads are merged into bigger chains called scaffolds. Because of a high level of statistical noise, chimeric reads, and genome repeats the problem of scaffolding...

On the inverse maximum perfect matching problem under the bottleneck-type Hamming distance

Given an undirected network G(V,A,c) and a perfect matching M of G, the inverse maximum perfect matching problem consists of modifying minimally the elements of c so that M becomes a maximum perfect matching with respect to the modified vector. In this article, we consider the inverse problem when the modifications are measured by the weighted bottleneck-type Hamming distance. We propose an alg...

Efficient algorithms for maximum weight matchings in general graphs with small edge weights

Let G = (V,E) be a graph with positive integral edge weights. Our problem is to find a matching of maximum weight in G. We present a simple iterative algorithm for this problem that uses a maximum cardinality matching algorithm as a subroutine. Using the current fastest maximum cardinality matching algorithms, we solve the maximum weight matching problem in O(W√n m log_n(n²/m)) time, or in O(Wn...
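To make the problem statement concrete, here is a minimal brute-force sketch of maximum weight matching on a tiny graph. It is not the iterative algorithm described in the abstract above, nor the method used by ScaffMatch; it simply enumerates every edge subset, so it is only usable on toy inputs. The function name `max_weight_matching_bruteforce` and the example graph are assumptions made for illustration.

```python
from itertools import combinations


def max_weight_matching_bruteforce(edges):
    """Return (best_weight, best_edges) for a small weighted graph.

    `edges` maps an (u, v) tuple to a positive weight. A matching is a
    set of edges in which no vertex appears more than once. Every edge
    subset is tried, so the running time is exponential in len(edges).
    """
    items = list(edges.items())
    best_weight, best_edges = 0, []
    for k in range(1, len(items) + 1):
        for subset in combinations(items, k):
            used = set()
            valid = True
            for (u, v), _ in subset:
                # Reject self-loops and edges that share an endpoint.
                if u == v or u in used or v in used:
                    valid = False
                    break
                used.update((u, v))
            if valid:
                weight = sum(w for _, w in subset)
                if weight > best_weight:
                    best_weight = weight
                    best_edges = sorted(edge for edge, _ in subset)
    return best_weight, best_edges


if __name__ == "__main__":
    # Hypothetical path graph a-b-c-d: the heaviest single edge (b, c)
    # is not part of the best matching.
    path = {("a", "b"): 2, ("b", "c"): 3, ("c", "d"): 2}
    print(max_weight_matching_bruteforce(path))
```

The path example shows why a greedy choice of the heaviest edge can be suboptimal: taking (b, c) blocks both outer edges, whose combined weight is larger.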

A distributed-memory approximation algorithm for maximum weight perfect bipartite matching

We design and implement an efficient parallel approximation algorithm for the problem of maximum weight perfect matching in bipartite graphs, i.e. the problem of finding a set of non-adjacent edges that covers all vertices and has maximum weight. This problem differs from the maximum weight matching problem, for which scalable approximation algorithms are known. It is primarily motivated by fin...

Scaling algorithms for approximate and exact maximum weight matching

The maximum cardinality and maximum weight matching problems can be solved in time Õ(m√n), a bound that has resisted improvement despite decades of research. (Here m and n are the number of edges and vertices.) In this article we demonstrate that this “m√n barrier” is extremely fragile, in the following sense. For any ε > 0, we give an algorithm that computes a (1 − ε)-approximate maximum weig...

Journal:
  • Bioinformatics

Volume 31, Issue 16

Pages: -

Publication date: 2015